Telepresence system with automatic preservation of user head size

ABSTRACT

A method and system for mutually immersive telepresencing are provided. A user is viewed at a user's location to provide a user's image. The size of the user's head is determined in the user's image. A surrogate having a surrogate's face display about the size of the user's head is provided. The user's image is processed based on the size of the surrogate's face display to provide an about life-size image of the user's head. The about life-size image is displayed on the surrogate's face display.

CROSS-REFERENCE TO RELATED APPLICATION

[0001] The present application contains subject matter related to co-pending U.S. patent application Ser. No. 09/784,902 and publication number US 2002/0118861 A1 by Norman Jouppi and Subramonium Iyer entitled “Head Tracking and Color Video Acquisition via Near Infrared Luminance Keying”.

BACKGROUND TECHNICAL FIELD

[0002] The present invention relates generally to videoconferencing and more specifically to telepresence systems.

BACKGROUND ART

[0003] Originally, video camera and audio systems were developed for improving communication among individuals who are separated by distance and/or time. The system and the process are now referred to as “videoconferencing”. Videoconferencing sought to duplicate, to the maximum extent possible, the full range, level and intensity of interpersonal communication and information sharing which would occur if all the participants were “face-to-face” in the same room at the same time.

[0004] Behavioral scientists know that interpersonal communication involves a large number of subtle and complex visual cues, referred to by names like “eye contact” and “body language,” which provide additional information over and above the spoken words and explicit gestures. These cues are, for the most part, processed subconsciously by the participants, and often communicate information that cannot be communicated in any other fashion.

[0005] In addition to spoken words, demonstrative gestures, and behavioral cues, face-to-face contact often involves sitting down, standing up, and moving around to look at objects or charts. This combination of spoken words, gestures, visual cues, and physical movement significantly enhances the effectiveness of communication in a variety of contexts, such as “brainstorming” sessions among professionals in a particular field, consultations between one or more experts and one or more clients, sensitive business or political negotiations, etc. In situations where the participants cannot be in the same place at the same time, the beneficial effects of face-to-face contact will be realized only to the extent that each of the remotely located participants can be “recreated” at each site.

[0006] Although videoconferencing has come into widespread use, it is still of limited value because it cannot closely approximate, for a user, the recreation of the remotely located participants. The systems generally use fixed-location cameras and conference-type telephones. There is no sense of the presence of the user being at the site of a remote meeting or of the presence of the remotely located participants being with the user.

[0007] To overcome these problems, a system called “robotic telepresence” has been developed. In robotic telepresence, a remotely controlled robot simulates the presence of the user for the remotely located participants. The user has a freedom of motion and control over the robot and video input that is not present in traditional videoconferencing, and this better simulates the feeling of the user being present in person at a remote site. The overall experience for the user and the people interacting with the robotic telepresence device is far superior to that of videoconferencing.

[0008] The robot platform typically includes a camera, a display device, a motorized platform that includes batteries, a control computer, and a wireless computer network connection. An image of the user is captured by a camera at the user's location and displayed on the display of the robotic telepresence device at the remote site.

[0009] More recently, a robotic telepresence system has been developed, which has a user station at a first geographic location and a robot at a second geographic location. The user station is responsive to a user and communicates information to and from the user. The robot is coupled to the user station and provides a three-dimensional representation of the user transmitted from the user station. The robot also senses predetermined types of information and communicates the sensed information back to the user to provide a representation for the user of the robot's surroundings.

[0010] Additionally, a system has been developed for head tracking and color video acquisition via near-infrared luminance keying, in which the head of a user is tracked in real time. A near-infrared camera is equipped with filters that discern the difference between a near-infrared-illuminated rear-projection screen behind the user and any foreground illumination to acquire a near-infrared image of the user. A color image of the user's head and the projection of a remote location are acquired by a color camera placed in close proximity to the near-infrared camera. A bounding box is placed around the near-infrared image of the user's head and translated to the view space of the color camera. The translated bounding box is used to crop the color image of the user's head for transmission to the remote location.
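The keying and cropping steps of that related system can be illustrated as follows. This is a minimal sketch under stated assumptions, not the related application's implementation: it treats every near-infrared pixel darker than a hypothetical `threshold` as foreground (the user occludes the brightly lit screen), takes the bounding box of all foreground pixels as the head box (a simplification, since real head tracking separates the head from the shoulders), and maps the box into the color camera's view with a hypothetical per-axis scale and offset standing in for a real calibration.

```python
def foreground_bbox(nir, threshold):
    """Return (left, top, right, bottom) of pixels darker than threshold.

    nir is a 2-D list of near-infrared intensities; right/bottom are exclusive.
    Assumption: the user appears dark against the NIR-illuminated screen.
    """
    rows = [r for r, row in enumerate(nir) if any(p < threshold for p in row)]
    if not rows:
        return None  # nobody in front of the screen
    cols = [c for c in range(len(nir[0])) if any(row[c] < threshold for row in nir)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def to_color_view(box, sx, sy, ox, oy):
    """Translate a NIR-image box into color-camera pixel coordinates.

    (sx, sy) and (ox, oy) are a hypothetical scale and offset obtained by
    calibrating the two closely spaced cameras against each other.
    """
    left, top, right, bottom = box
    return (round(left * sx + ox), round(top * sy + oy),
            round(right * sx + ox), round(bottom * sy + oy))


def crop(image, box):
    """Crop a 2-D list image (rows of pixels) to the given box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Usage: a 4x4 NIR frame with a dark blob, mapped into an 8x8 color frame.
nir = [[255, 255, 255, 255],
       [255, 10, 20, 255],
       [255, 30, 255, 255],
       [255, 255, 255, 255]]
color = [[(r, c) for c in range(8)] for r in range(8)]
box = to_color_view(foreground_bbox(nir, 128), 2.0, 2.0, 0, 0)
head = crop(color, box)
```

Here the color frame is assumed to have twice the resolution of the near-infrared frame, so the 2x2 foreground box becomes a 4x4 crop.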

[0011] However, there are many problems that still need to be addressed to provide improved robotic telepresence realism; i.e., to make the user appear to be present in person.

[0012] Solutions to problems of this sort have been long sought, but have long eluded those skilled in the art.

DISCLOSURE OF THE INVENTION

[0013] The present invention provides a method and system for mutually immersive telepresencing. A user is viewed at a user's location to provide a user's image. The size of the user's head is determined in the user's image. A surrogate having a surrogate's face display about the size of the user's head is provided. The user's image is processed based on the size of the surrogate's face display to provide an about life-size image of the user's head. The about life-size image is displayed on the surrogate's face display. This provides a means to more closely simulate the feeling of the actual presence of a user during videoconferencing with a life-size image presented on the display.

[0014] Certain embodiments of the invention have other advantages in addition to or in place of those mentioned above. The advantages will become apparent to those skilled in the art from a reading of the following detailed description when taken with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 is an overview of a Mutually-Immersive Mobile Telepresence System;

[0016] FIG. 2 is a view of the surrogate in accordance with the present invention;

[0017] FIG. 3 is a view of the user's location in accordance with the present invention;

[0018] FIG. 4 is a view from one of the cameras mounted beside the user's display in accordance with the present invention;

[0019] FIG. 5 is a mode of preserving head size of a user on a surrogate in accordance with the present invention; and

[0020] FIG. 6 is a method for mutually immersive telepresencing in accordance with the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

[0021] The present invention relates to a Mutually-Immersive Mobile Telepresence (E-Travel) System. The user sits in front of a display showing the remote location, and a robot device having a display of the user is located at the remote location. Video and audio are transmitted between the display and the robot device. The robot device may have a humanoid or a non-humanoid shape, and is referred to as a “surrogate”.

[0022] Behavioral scientists know that interpersonal communication involves a large number of subtle and complex visual cues, referred to by names like “gaze” and “eye contact,” which provide additional information over and above the spoken words and explicit gestures. Gaze relates to others being able to see where a person is looking, and eye contact relates to two persons each directing their gaze at the other's eyes. These cues are, for the most part, processed subconsciously by the people involved, and often communicate vital information.

[0023] In situations where not all the people can be in the same place at the same time, the beneficial effects of face-to-face contact will be realized only to the extent that a remotely located person, or “user”, can be “recreated” at the site of the meeting where the “participants” are present.

[0024] It has been discovered by the inventor during experimentation with various robotic telepresence systems that people are used to relating to other people whose heads are roughly the same size as their own. People use the fact that most adult heads are roughly the same size in a number of ways.

[0025] First, it makes it easier to judge distances to people. For example, if a person's face were presented larger than life size, a viewer would say the person is “in your face”. Similarly, if a person's head were presented significantly smaller than life size, some people would describe the person as being more “distant” than if they had viewed the person at a scale matching real life.

[0026] Second, such phrases as “big headed” and “small headed” have negative connotations. Studies have shown that users associate artifacts in the presentation of people's images with shortcomings in the people themselves.

[0027] Third, young people's heads appear smaller than adult heads. Thus, a smaller-appearing head creates the impression of a younger person.

[0028] Fourth, having a person's face presented at near life-size aids in the identification of facial expressions and an accurate perception of gaze.

[0029] Fifth, changes in the orientation of a user's head in a telepresence system should not change the perceived size of the user's head. For example, in one of the inventor's original telepresence head-tracking systems, the display scaled the user's head to fit and fill the surrogate's face display panels. Thus, if the user's head tilted forward, increasing the head's virtual width, the original system would shrink the user's head so that it would continue to fit. Similarly, if the user's head turned to the left or right, the user's head would sometimes shrink or expand. It has been discovered that these behaviors are undesirable and do not occur if the user's head size is recreated accurately.

[0030] Finally, it has been discovered that to immersively create the perception that a user is physically present at the surrogate's location, it is necessary to present the user's head at the same size as if the user were physically present.

[0031] It has also been discovered that besides accurately reproducing the size of the user's head, the head image must be pleasingly positioned on the surrogate's face display. This has been found to be extremely difficult because tilting of the user's head or a large hairstyle may make the user's head larger than can be displayed on the surrogate's face displays.

[0032] Thus, it has been found to be desirable to preserve the actual head size of a user by displaying the user's head with the same width, height, and length as if the user were physically present. It has also been found to be desirable to present the user's head in a visually pleasing position on the surrogate's face display, while only requiring modest amounts of video manipulation and computation to do so.

[0033] Referring now to FIG. 1, therein is shown a Mutually-Immersive Mobile Telepresence System 100. The system 100 includes a user's display 102 at a user's location 104 and a robotic device or a surrogate 106 at a surrogate's location 108.

[0034] A user 110 may sit in a chair 114 or stand with the user's head 111 and the user's face 113 facing the user's display 102, on which an image of the surrogate's surroundings may be back-projected from a projector 115. The surrogate 106 is connected to the user's display 102 via a high-speed network 112 through a user's transceiver-computer system 116.

[0035] First and second camera sets 118 and 120 are set at the corners of the user's display 102 to view the user 110 and transmit an image of the user's face 113 to the surrogate 106.

[0036] Referring now to FIG. 2, therein is shown the surrogate 106 in accordance with the present invention. The surrogate 106 has a surrogate's head 202 made with one or more surrogate's face displays 204, which could be made of one or more liquid crystal display (LCD) panels.

[0037] One or more surrogate's cameras 206 in the surrogate's head 202 capture live video images at the surrogate's location 108. The images from the surrogate's cameras 206 in the surrogate's head 202 are compressed and transmitted over the high-speed network 112 by a surrogate's transceiver-computer system 207 in the surrogate 106 to the user's transceiver-computer system 116 (shown in FIG. 1) at the user's location 104.

[0038] The surrogate 106 is made in two parts that are movable relative to each other over a distance 205. One part is a leg portion 208 and the other is a torso portion 210. A monitor 209 is connected to the surrogate's transceiver-computer system 207 to sense the extension or height of the torso portion 210 relative to the leg portion 208. The surrogate's head 202 is mounted above the torso portion 210, and the torso portion 210 may be raised or lowered relative to the leg portion 208 so as to raise or lower the surrogate's head 202 relative to the surface on which the surrogate 106 moves or is moved. The surrogate 106 includes a drive portion 212, which permits movement of the surrogate 106.

[0039] In the present invention, an image of the user's head 111 (of FIG. 1) must be acquired in a way in which the scale of the image is known for display as a head image 211. It is not enough to assume that all people have the same head size, for several reasons. Depending on a person's hairstyle (ranging from a shaved head for men to a bouffant or beehive hairdo for women), the actual size of a person's head may vary greatly. Also, children have smaller heads than adults. Further, users are more comfortable when they have a reasonable range of freedom of movement and are not constrained to sit or stand in a precisely positioned location while using the system 100 for long periods of time. Because the user 110 is thus free to move closer to or farther away from the first and second camera sets 118 and 120 of FIG. 1, the scale of the user's image is not known a priori.

[0040] Also, once the scale of the user's image is known, the head image 211 must be shown on the surrogate's face displays 204 at life size. The head image 211 of the user's head 111 must also be positioned within the surrogate's face displays 204 in the most pleasing manner with relatively little image manipulation and computation. This means the positioning of the head image 211 should be stable and free of jitter and other artifacts.

[0041] To determine the position of the user's head 111 in X, Y, and Z coordinates relative to the first and second camera sets 118 and 120, several techniques may be used. Conventionally known near-infrared (NIR) difference keying or chroma-key techniques may be used with camera sets, which may be combinations of near-infrared and video cameras.

[0042] Referring now to FIG. 3, therein is shown the user's location 104 looking down from above. In this embodiment, the first and second camera sets 118 and 120 are used as an example. The distance x between the first and second camera sets 118 and 120 is known, as are the angles h₁ and h₂ between the centerlines of sight 302 and 304 of the first and second camera sets 118 and 120 and the centerlines 306 and 308, respectively, to the user's head 111. It is also known that the first and second camera sets 118 and 120 have the centerlines 302 and 304 set at a fixed angle relative to each other; e.g., 90 degrees. If the first and second camera sets 118 and 120 are angled at 45 degrees relative to the user's display 102, the angles between the user's display 102 and the centerlines 306 and 308 to the user's head 111 are s₁ = 45° − h₁ and s₂ = 45° + h₂. Let x₁ and x₂ be the distances along the user's display 102 from the first and second camera sets 118 and 120, respectively, to the point directly in front of the user's head 111, and let y be the perpendicular distance 310 from the user's display 102 to the user's head 111. From trigonometry:

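$$x_1 \tan s_1 = y = x_2 \tan s_2 \qquad \text{(Equation 1)}$$

and

$$x_1 + x_2 = x \qquad \text{(Equation 2)}$$

so

$$x_1 \tan s_1 = (x - x_1) \tan s_2 \qquad \text{(Equation 3)}$$

regrouping

$$x_1 (\tan s_1 + \tan s_2) = x \tan s_2 \qquad \text{(Equation 4)}$$

solving for x₁

$$x_1 = \frac{x \tan s_2}{\tan s_1 + \tan s_2} \qquad \text{(Equation 5)}$$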

[0043] Knowing either x₁ or x₂ (with x₂ = x − x₁ from Equation 2), y can then be computed from Equation 1.

[0044] (To reduce errors, compute y 310 from both and take the averagevalue.)

[0045] Then the distances from each camera to the user can be computed as follows:

$$d_1 = \frac{y}{\sin s_1} \qquad \text{(Equation 6)}$$

$$d_2 = \frac{y}{\sin s_2} \qquad \text{(Equation 7)}$$
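Equations 1 through 7 can be evaluated directly. The sketch below is a minimal Python illustration that assumes angles in degrees, the 45-degree mounting, and the sign conventions s₁ = 45° − h₁ and s₂ = 45° + h₂ stated above; the function name `locate_head` is hypothetical. Because Equation 5 is derived from Equation 1, the two values of y agree in exact arithmetic; the sketch still averages them as suggested in [0044], which here only guards against rounding.

```python
import math


def locate_head(x, h1_deg, h2_deg, mount_deg=45.0):
    """Locate the user's head from two camera sets, per Equations 1-7.

    x         : known distance between the first and second camera sets
    h1_deg    : angle between camera 1's centerline and its line to the head
    h2_deg    : angle between camera 2's centerline and its line to the head
    mount_deg : camera mounting angle relative to the display (45 degrees)
    Returns (x1, x2, y, d1, d2) in the same length unit as x.
    """
    # Angles between the display and each line of sight to the head.
    s1 = math.radians(mount_deg - h1_deg)
    s2 = math.radians(mount_deg + h2_deg)
    t1, t2 = math.tan(s1), math.tan(s2)

    x1 = x * t2 / (t1 + t2)          # Equation 5
    x2 = x - x1                      # Equation 2

    # Equation 1 gives y from either side; average the two as in [0044].
    y = 0.5 * (x1 * t1 + x2 * t2)

    d1 = y / math.sin(s1)            # Equation 6
    d2 = y / math.sin(s2)            # Equation 7
    return x1, x2, y, d1, d2
```

For example, with the camera sets a hypothetical 60 inches apart and the user centered (h₁ = h₂ = 0), s₁ = s₂ = 45°, so x₁ = y = 30 inches and d₁ = d₂ ≈ 42.4 inches.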

[0046] Referring now to FIG. 4, therein is shown a user's image 400, from either of the first and second camera sets 118 and 120 mounted beside the user's display 102, that is used in determining the user's head height.

[0047] The combination of camera and lens determines the overall vertical ($f_v$) and horizontal ($f_h$) fields of view of the user's image 400. Based on these and the position of the user's head 111 in the field of view, a processor can compute the horizontal (h) and vertical (v) angles between the top center of the user's head 111 and an optical center 402 of the user's image 400. From these angles and the distance to the user's head 111, the height H of the user's head 111 above a floor can be computed.
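One way to carry out this height computation is sketched below. It is a hedged illustration rather than the stated procedure: it assumes a hypothetical `camera_height` of the camera's optical center above the floor (the text gives no value), maps pixel rows to angles linearly across the vertical field of view $f_v$ (the same proportional treatment the text applies horizontally in [0051]), and takes a horizontal camera-to-head distance such as d₁ from Equation 6.

```python
import math


def head_top_height(row_px, image_height_px, fov_v_deg, d_horiz, camera_height):
    """Estimate the height H of the top of the user's head above the floor.

    row_px          : image row of the top center of the head (0 = top row)
    image_height_px : image height in pixels
    fov_v_deg       : vertical field of view f_v of the camera and lens
    d_horiz         : horizontal camera-to-head distance (e.g. d1)
    camera_height   : hypothetical height of the optical center above the floor
    """
    # Vertical angle v from the optical center to the head top; positive
    # when the head top is above the image center. The linear
    # pixel-to-angle mapping is an approximation, best near the center.
    v_deg = (image_height_px / 2.0 - row_px) / image_height_px * fov_v_deg
    return camera_height + d_horiz * math.tan(math.radians(v_deg))
```

With a head top on the center row, H equals the assumed camera height; a head top at the very top of a 30-degree frame lies 15 degrees above the optical axis.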

[0048] Once the distance to the user's head 111 from each of the camera sets 118 and 120 is known, the scale of the user's head 111, in terms of a linear measure per angular percentage of the camera's field of view, can be determined to provide the scale of the head image 211 in FIG. 2 and to preserve head size.

[0049] For example, the scale of the user's head 111 could be about one inch per 3% of the camera's field of view $f_h$. Since the surrogate's transceiver-computer system 207 (in FIG. 2) knows the width of the surrogate's face displays 204 (for example, about 10 inches wide), 30% (10 inches × 3% per inch) of the width of the user's image 400 should be displayed to maintain the head image 211 at life size on a ten-inch-wide display.

[0050] Referring now to FIG. 5, therein is shown a mode of preserving head size of the head image 211 of FIG. 2.

[0051] If the distance $d_u$ to the user's head is 48 inches, and the horizontal field of view $f_h$ (in FIG. 4) of the camera's lens is 40 degrees, then from trigonometry, one inch perpendicular to the distance vector $d_u$ would subtend an angle of $\arctan(1/48) \approx 1.193$ degrees at the position of the first camera set 118. Since the camera's field of view is 40 degrees, each inch of the user's head 111 must subtend $100 \times (1.193/40) \approx 2.98\%$ of the horizontal width of the user's image 400.
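The arithmetic in [0049] and [0051] can be reproduced in a few lines. This sketch follows the text in treating the fraction of the image width as proportional to the subtended angle, an approximation that holds best near the optical center; the function names are hypothetical.

```python
import math


def percent_per_inch(distance_in, fov_h_deg):
    """Percent of the image width subtended by one inch at distance_in inches."""
    angle_deg = math.degrees(math.atan(1.0 / distance_in))
    return 100.0 * angle_deg / fov_h_deg


def visible_width_fraction(display_width_in, distance_in, fov_h_deg):
    """Fraction of the camera image width to show on the face display so
    that the head image appears life-size."""
    return display_width_in * percent_per_inch(distance_in, fov_h_deg) / 100.0


print(round(percent_per_inch(48.0, 40.0), 2))              # 2.98
print(round(visible_width_fraction(10.0, 48.0, 40.0), 2))  # 0.3
```

With the 10-inch display of [0049], about 30% of the image width is shown, matching the text.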

[0052] Once the scale for displaying the head image 211 of the user's head 111 on each of the surrogate's face displays 204 is known, it is necessary to compute how to position the head image 211 on the surrogate's face displays 204 of FIG. 2.

[0053] It has been discovered that presenting the head image 211 with the user's face 113 (of FIG. 1) in a classic portrait style, similar to that found in high-school yearbooks, is generally attractive and visually pleasing.

[0054] It has also been discovered that if the width of the head image 211 fits in the surrogate's face display 204, the head image 211 should be horizontally centered. Then, it has been found that setting the vertical position so that there is about one inch of background between the head image 211 and the top edge of the surrogate's face display 204 provides a visually pleasing image. This is much more visually pleasing than having the head image 211 abut the top of the surrogate's face display 204.
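The placement rule for a head image that fits can be expressed as below. This is a minimal sketch in display inches with hypothetical names; it assumes the image has already been scaled to life size and, because the text handles the non-fitting cases by cropping ([0055] to [0057]), it simply rejects a head image wider than the display.

```python
def place_head(head_w_in, display_w_in, top_margin_in=1.0):
    """Return (left, top) offsets in inches for a life-size head image.

    The head image is centered horizontally and placed about one inch below
    the top edge of the face display, in the classic portrait style.
    """
    if head_w_in > display_w_in:
        raise ValueError("head image is wider than the display; crop it first")
    left = (display_w_in - head_w_in) / 2.0
    return left, top_margin_in
```

For a hypothetical 7-inch-wide head image on a 10-inch display, this yields 1.5 inches of background on each side and 1 inch above.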

[0055] If the head image 211 is wider than the surrogate's face display 204 (which occurs only with very large hairstyles), it has been discovered that it is necessary to crop the head image 211 based on the following general principles. First, it is necessary to measure the orientation of the user's head 111. This orientation can be determined from the user's body orientation as measured using computer-related components. The computer-related components could be a position sensor and a position/orientation measuring system, such as the Polhemus Fastrak available from Polhemus Incorporated of Colchester, Vt. 05446, which is capable of providing dynamic, real-time, six degree-of-freedom measurement of position (X, Y, and Z Cartesian coordinates) and orientation (azimuth, elevation, and roll).

[0056] For example, if the user 110 (of FIG. 1) is facing within 45 degrees of the direction of the first camera set 118, both sides of the head image 211 are cropped evenly. If the user's head 111 is closest to a profile orientation, the back side of the head image 211 is cropped (i.e., removing some hair but keeping the entire user's face 113). If the back of the user's head 111 is toward the first camera set 118, each side of the head image 211 is cropped equally.

[0057] If the head image 211 is taller than the surrogate's face display 204 (again, usually only in cases of extreme hairstyles), the top and bottom of the head image 211 are cropped equally. This reduces the display of the top of the user's hair and neck, approximating a close shot common in movies and television.
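The cropping rules of [0055] to [0057] can be sketched as a function of the head's azimuth relative to the camera set, as reported by an orientation sensor. The yaw sign convention, the reading of "closest to a profile orientation" as an azimuth between 45 and 135 degrees, and all names are assumptions made for illustration.

```python
def crop_amounts(img_w, img_h, disp_w, disp_h, yaw_deg):
    """Return (left, right, top, bottom) amounts to trim from a life-size head image.

    img_w, img_h   : size of the life-size head image
    disp_w, disp_h : size of the surrogate's face display (same units)
    yaw_deg        : head azimuth relative to the camera set; 0 = facing it,
                     +/-90 = profile, +/-180 = back of the head toward it.
                     Hypothetical convention: positive yaw places the back
                     of the head on the left side of the image.
    """
    excess_w = max(0.0, float(img_w - disp_w))
    excess_h = max(0.0, float(img_h - disp_h))
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0  # normalize to [-180, 180)

    left = right = 0.0
    if excess_w > 0.0:
        if 45.0 < abs(yaw) < 135.0:
            # Closest to profile: trim only the back of the head (hair),
            # keeping the entire face.
            if yaw > 0.0:
                left = excess_w
            else:
                right = excess_w
        else:
            # Facing within 45 degrees, or back toward the camera:
            # trim both sides evenly.
            left = right = excess_w / 2.0

    # Taller than the display: trim top and bottom equally.
    top = bottom = excess_h / 2.0
    return left, right, top, bottom
```

For example, a 12-inch-wide head image on a 10-inch display loses 1 inch per side when the user faces the camera, but 2 inches from the back side alone in profile.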

[0058] Finally, it has been discovered that it is useful to apply an exponential time-weighted average, over about a second of time, to the positions and sizes of the user's head 111 obtained above, so that users can nod their heads, shift position, scratch their noses, etc. without the system 100 (of FIG. 1) having to go through the processing required to remove such visible gestures from the surrogate's face display 204.
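One common way to realize an exponential time-weighted average over about a second is a first-order filter whose weight depends on the time between frames, sketched below. The class name, the `(x, y, scale)` sample layout, and the `tau_s` time constant are assumptions for illustration; the text specifies only the roughly one-second averaging window.

```python
import math


class HeadSmoother:
    """Exponentially time-weighted average of head position and scale.

    tau_s is the averaging time constant (about one second per the text).
    Each update weights the new sample by 1 - exp(-dt / tau_s), so older
    samples decay exponentially with their age regardless of frame rate.
    """

    def __init__(self, tau_s=1.0):
        self.tau_s = tau_s
        self.state = None  # smoothed (x, y, scale) after the first sample

    def update(self, sample, dt_s):
        """Blend a new (x, y, scale) measurement taken dt_s seconds after the last."""
        if self.state is None:
            self.state = tuple(sample)
        else:
            alpha = 1.0 - math.exp(-dt_s / self.tau_s)
            self.state = tuple(s + alpha * (m - s) for s, m in zip(self.state, sample))
        return self.state
```

Smoothing the computed position and scale, rather than the pixels themselves, keeps the displayed head steady during brief movements while still following a sustained change within a few seconds.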

[0059] Referring now to FIG. 6, therein is shown a method 600 for mutually immersive telepresencing in accordance with the present invention. The method 600 includes: a step 602 of viewing a user at a user's location to provide a user's image; a step 604 of determining the size of the user's head in the user's image; a step 606 of providing a surrogate having a surrogate's face display about the size of the user's head; a step 608 of processing the user's image based on the size of the surrogate's face display to provide an about life-size image of the user's head; and a step 610 of displaying the about life-size image on the surrogate's face display.

[0060] While the invention has been described in conjunction with a specific best mode, it is to be understood that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, it is intended to embrace all such alternatives, modifications, and variations, which fall within the spirit and scope of the included claims. All matters hitherto set forth herein or shown in the accompanying drawings are to be interpreted in an illustrative and non-limiting sense.

The invention claimed is:

1. A method for mutually-immersive telepresencing comprising: viewing a user at a user's location to provide a user's image; determining the size of the user's head in the user's image; providing a surrogate having a surrogate's face display about the size of the user's head; processing the user's image based on the size of the surrogate's face display to provide an about life-size image of the user's head; and displaying the about life-size image on the surrogate's face display.
 2. The method as claimed in claim 1 wherein: determining the size of the user's head includes determining the location of the user's head at the user's location.
 3. The method as claimed in claim 1 wherein: determining the size of the user's head in the user's image includes determining a scale of the user's head; and displaying the about life-size image in a classic portrait style.
 4. The method as claimed in claim 1 wherein: processing the user's image includes cropping to provide a close-up image of the face of the user.
 5. The method as claimed in claim 1 wherein: displaying the about life-size image includes exponential time-weighted averaging of a plurality of the user's images to display the about life-size image.
 6. A method for mutually-immersive telepresencing comprising: viewing a user at a user's location to provide a user's image; determining the size of the user's head in the user's image using a distance of the user's head from where the user is viewed and a width of the user's head in the image; providing a surrogate having a surrogate's face display about the size of the user's head; processing the user's image based on the size of the surrogate's face display to provide an about life-size image of the user's head; and displaying the about life-size image on the surrogate's face display.
 7. The method as claimed in claim 6 wherein: determining the size of the user's head includes determining the location and orientation of the user's head at the user's location.
 8. The method as claimed in claim 6 wherein: determining the size of the user's head in the user's image includes determining a scale of the user's head; and displaying the about life-size image in a classic portrait style with a clearance between a top of the surrogate's face display and the about life-size image.
 9. The method as claimed in claim 6 wherein: determining the size of the user's head in the user's image includes determining the location and orientation of the user's head at the user's location; and processing the user's image includes cropping to provide a close-up image of the face of the user selected from a group consisting of: cropping both sides of the about life-size image of the user's head when viewing the user from within 45 degrees on either side of the face of the user, cropping the backside of the about life-size image of the user's head when viewing a profile of the user, cropping both sides of the about life-size image of the user's head when viewing the back of the user's head, and cropping both top and bottom of the about life-size image of the user's head when viewing the user's head providing a life-size image of the user's head which is taller than will fit in the surrogate's face display.
 10. The method as claimed in claim 6 wherein: displaying the about life-size image includes exponential time-weighted averaging of head position and scale computed from a plurality of the user's images before displaying the about life-size image.
 11. A system for mutually-immersive telepresencing comprising: a camera set for viewing a user at a user's location to provide a user's image; a computer for determining the size of the user's head in the user's image; a surrogate having a surrogate's face display about the size of the user's head; and a processor for processing the user's image based on the size of the surrogate's face display to provide an about life-size image of the user's head and for displaying the about life-size image on the surrogate's face display.
 12. The system as claimed in claim 11 wherein: the computer has components for determining the location of the user's head at the user's location.
 13. The system as claimed in claim 11 wherein: the computer includes means for determining a scale of the user's head; and the surrogate's face display displays the about life-size image in a classic portrait style.
 14. The system as claimed in claim 11 wherein: the processor includes means for cropping to provide a close-up image of the face of the user.
 15. The system as claimed in claim 11 wherein: the processor includes means for exponential time-weighted averaging of a plurality of the user's images to display the about life-size image.
 16. A system for mutually-immersive telepresencing comprising: a camera set for viewing a user at a user's location to provide a user's image; a computer for determining the size of the user's head in the user's image using a distance of the user's head in the image from where the user is viewed and a width of the user's head in the image; a surrogate having a surrogate's face display about the size of the user's head; and a processor for processing the user's image based on the size of the surrogate's face display to provide an about life-size image of the user's head and for displaying the about life-size image on the surrogate's face display.
 17. The system as claimed in claim 16 wherein: the computer has components for determining the location and orientation of the user's head at the user's location.
 18. The system as claimed in claim 16 wherein: the computer includes means for determining a scale of the user's head; and the processor includes means for displaying the about life-size image in a classic portrait style with a clearance between a top of the surrogate's face display and the about life-size image.
 19. The system as claimed in claim 16 wherein: the computer has components for determining the location and orientation of the user's head at the user's location; and the processor includes means for cropping to provide a close-up image of the face of the user selected from a group consisting of: both sides cropped of the about life-size image of the user's head when the camera set views the user from within 45 degrees on either side of the face of the user, the backside cropped of the about life-size image of the user's head when the camera set views a profile of the user, both sides cropped of the about life-size image of the user's head when the camera set views the back of the user's head, and both top and bottom cropped of the about life-size image of the user's head when the camera set views the user's head providing a life-size image of the user's head which is taller than will fit in the surrogate's face display.
 20. The system as claimed in claim 16 wherein: the processor includes means for exponential time-weighted averaging of head position and scale computed from a plurality of the user's images before displaying the about life-size image.